Abstract

Multiple choice testing is a common but often ineffective method for evaluating learning. A newer approach, 
however, using Immediate Feedback Assessment Technique (IF AT®, Epstein Educational Enterprise, Inc.) forms, 
offers several advantages. In particular, a student learns immediately if his or her answer is correct and, in the 
case of an incorrect answer, has an opportunity to provide a second response and receive partial credit for a 
correct second attempt. For a multiple choice question with five possible answers, the IF AT® form covers spaces 
labeled A through E with a thin opaque film; when the film is scratched away, a star indicates the correct answer. 
This study was conducted to assess learning after an initial incorrect answer. By random chance alone,
students have a 25% chance of guessing a correct second answer (i.e., 1 of the 4 remaining
answers on the IF AT® form). Analysis of second responses for 8775 questions on IF AT® forms in 22 classes
over 3 years showed that 44.9% of second answers were correct, significantly higher than one might
expect from random guessing. This suggests that students learned from an incorrect answer and, possibly by
re-reading the problem, were able to demonstrate some level of mastery of the material. These data lead us to
conclude that IF AT® forms are useful assessment tools.
1. Introduction

Multiple-choice exams are advantageous when classes are large, when instructors wish to test frequently, and
when corrected exams must be returned to students quickly. Multiple-choice exams, however, have several
disadvantages for testing mastery of course material, including giving students only one opportunity to respond
to a question. The
Immediate Feedback Assessment Technique (IF AT®) form, introduced by M. Epstein, B. Epstein and Brosvic 
(2001), attempted to circumvent these issues. When using the form, a student scratches off an opaque film 
corresponding to the answer to a multiple choice question and observes either a star for a correct answer or a 
blank for an incorrect answer; instructors have the codes for different IF AT® forms so as to write multiple choice 
questions where the correct answers correspond to the position of the stars on the IF AT® form. Thus, with the IF 
AT® form, a student receives an immediate response for correct answers; well-prepared students are particularly 
encouraged by this procedure (Dibattista & Gosse, 2006). But, if the area under the patch is blank (an incorrect 
answer), the student can immediately reconsider the question and give a second-chance answer, for which some 
partial credit may be awarded. More information about IF AT® forms can be found at the Epstein Educational 
Enterprise, Inc. web pages (http://www.epsteineducation.com/home/), and in an excellent overview by Smith 
(2013). 

Nicol and Macfarlane-Dick (2006) explore the importance of learning through assessment and explain the need 
for a shift in focus from simply assessing a student’s knowledge to a focus on a lifetime of learning. Fischer 
(1999) argues that lifelong learning is best fostered in self-directed learning environments. Fischer
(1999) further explains that this self-directed learning should utilize authentic, complex problems and be 
embedded in a rewarding endeavor. For students to become self-directed learners they must monitor and adjust 
their approaches to learning (Ambrose, Bridges, DiPietro, Lovett, & Norman, 2010). The Immediate Feedback
Assessment Technique allows for students to monitor and adjust their approach as they solve a problem (if they 
get their first answer wrong) and do so under the umbrella of a rewarding endeavor. Although students benefit by 
knowing immediately whether a mistake has been made in answering a question and by having the opportunity 
to receive partial credit for a correct second answer (Epstein et al., 2002; Dihoff, Brosvic, Epstein, & Cook, 
2004), it is less clear whether a correct second answer is the result of learning from a mistake or of good
guessing. This leads to our primary research questions: How accurately can students respond to a question to
which they have already responded incorrectly? Can we ascertain whether students, when given a second chance,
are merely guessing or are utilizing the learning triggered by the knowledge of an incorrect answer? In a
problem-solving situation, immediate knowledge of an incorrect response should trigger a student’s reappraisal
of the information explicitly and implicitly embedded in the question. Ideally, this results in a cognitive
reevaluation of the question, helping the student to find a more plausible answer on the second attempt.
Whether the credit earned through second attempts should be attributed to learning from mistakes or to random
guessing remains an open question. Starting from the assumption that informed second-chance responses would
gain more partial credit points than random guesses, we examined empirically whether the additional credit
students earned in the present study was significantly greater than the points they might have earned by blind
guessing on their second-chance responses. This statistical study was undertaken to assess learning after an
initial incorrect answer using the IF AT® format.

2. Methods 

IF AT® forms were purchased from Epstein Educational Enterprise, Inc. (Cincinnati, OH, 
http://www.epsteineducation.com/home/). Of several types of forms, this study used forms with 5 possible 
answers. Different versions of the forms come with instructor-only answer keys, which allow instructors to
construct tests with correct answers in the corresponding positions. Newer forms come with an opaque backing to
ensure even greater resistance to any possible “see through”.

The study was conducted over a 3-year period in second-year Organic Chemistry courses. A total of 22 classes 
used the IF AT® forms for three hourly exams and a final given to a total of 1,449 students. The multiple-choice 
portion of each test consisted of between 15 and 25 questions, each with 5 answer choices. In total, 26,175
questions were answered, of which 8,775 were answered incorrectly on the first attempt; the latter were
evaluated for correct second attempts.

P-values were obtained to determine whether students earned partial credit at a frequency greater than that
expected from random chance. These p-values were obtained for each individual test as well as for the pooled
data from all tests. Each p-value was obtained by subtracting the value of the cumulative normal distribution
from 1.0.
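
As a rough illustration of the calculation described above, the sketch below computes an upper-tail p-value as 1 minus the standard normal cumulative distribution, applied to the pooled counts reported in this study (3938 correct second answers out of 8775). The one-sample proportion z-statistic and the function names are assumptions made for illustration; the text does not state exactly which test statistic was used.

```python
import math


def upper_tail_p(z: float) -> float:
    """Return 1 - Phi(z), the upper-tail area of the standard normal."""
    # erfc avoids the precision loss of computing 1 - cdf directly.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def proportion_z(successes: int, trials: int, p0: float) -> float:
    """z-statistic for an observed proportion against a null rate p0.

    Assumed form of the test; not confirmed by the source.
    """
    p_hat = successes / trials
    std_err = math.sqrt(p0 * (1.0 - p0) / trials)
    return (p_hat - p0) / std_err


# Pooled counts from the study; 0.25 is the random-guess rate (1 of 4).
z = proportion_z(3938, 8775, 0.25)
p_value = upper_tail_p(z)
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

Under these assumptions the z-statistic is very large (roughly 43), so the upper-tail probability is far below any conventional significance threshold.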

Further analysis used a t-test to determine whether the total amount of partial credit actually earned
throughout the study differed significantly from the total amount that would have been earned had students
earned partial credit only through random chance. Finally, a Pearson correlation was calculated to measure the
strength of the relationship between the grade a student earned and the amount of partial credit that student
earned.

3. Results 

Using the 5-answer IF AT® forms, the likelihood of a student getting a question correct on the second try by 
random guessing would be 25% (1 correct answer/4 remaining choices). In the current study with 8,775 such 
questions, random guessing would be expected to give a correct answer 2194 times (see Figure 1). In
reality, students answered correctly on a second attempt a total of 3938 times, or 44.9% of the time. While it
is not possible to rule out that guessing had some effect on this number, our data suggest that students
utilized some technique and prior knowledge to identify the correct answers. It is important to note, however,
that our data are only indicative of learning and do not confirm that second-chance correct responses are
learning-driven.
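
The arithmetic behind these figures can be checked with a short sketch. The variable names are illustrative only; the counts are those reported in this section.

```python
N_SECOND_ATTEMPTS = 8775  # questions answered incorrectly on the first attempt
N_CORRECT_SECOND = 3938   # correct second attempts actually observed
REMAINING_CHOICES = 4     # 5 answers minus the one already eliminated

# Chance of a blind guess landing on the star among the remaining choices.
chance_rate = 1 / REMAINING_CHOICES

# Correct second answers expected from blind guessing alone.
expected_by_chance = round(N_SECOND_ATTEMPTS * chance_rate)

# Observed success rate on second attempts.
observed_rate = N_CORRECT_SECOND / N_SECOND_ATTEMPTS

# Relative excess of observed over chance-expected correct answers.
excess_over_chance = N_CORRECT_SECOND / expected_by_chance - 1

print(expected_by_chance)
print(f"{observed_rate:.1%}")
print(f"{excess_over_chance:.1%}")
```

The relative excess comes to just under 80%, which is the comparison made for the complete data set later in this section.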

[Figure 1: bar chart comparing total attempts, correct first attempts, wrong first attempts, correct second
attempts (answers calculated vs. answers that actually occurred), and missed second attempts (answers
calculated vs. answers that actually occurred).]

Figure 1. Rate of accurate responses actually obtained versus random chance


A t-test was run to determine the level of difference between the total partial credit actually earned (N=8775,
Mean=179.00, SD=46.18, CI=1.27) and the value for 25% of the total possible credit that would hypothetically
be obtained through random guessing (N=8775, Mean=99.72, SD=20.53, CI=0.56). The t-test returned a p-value of
9.08 x 10^-12 (N=8775, SD=53.44, CI=1.47). This value suggests that students were utilizing some form of
discernment to determine which of their remaining possibilities was the correct choice. Of the 22 tests, the
greatest p-value obtained was 4.117 x 10^-8. For this specific test (referred to as exam F), there were 15
multiple-choice questions; students were given the opportunity to earn 3 points for a correct first answer, 1 point 
for a correct second answer, or zero points for two incorrect attempts. That would result in a total possible score 
of 45 points. Exam F had an average score without partial credit of 21.34 and a median score without partial 
credit of 21. When partial credit was included for Exam F the average score was 24.30 and the median score was 
25.00. The close proximity between the average and the median suggests that there are no dramatic outliers 
affecting the calculations. The average test grade without any partial credit was 47.42%; after the addition of
partial credit it was 53.99%. Partial credit alone thus raised the average grade by 6.57 percentage points,
which is over half of a letter grade for the multiple-choice portion of the exam (see Figure 2 for a
comparison of average test scores without partial credit, with partial credit obtained at random chance, and with
the partial credit that was actually obtained for each exam).

[Figure 2: bar chart titled "Comparison of Average Test Grades by Type of Partial Credit Earned (No Partial
Credit vs. Random Chance vs. Actual)". Horizontal axis: exams A through V. Series: test grade without partial
credit; test grade with partial credit at random chance; test grade with partial credit actually earned.]

Figure 2. Comparison of average test percentage


A t-test was used to determine the level of significance between the scores students earned with the partial
credit they actually obtained (N=44, Mean=24.30, SD=7.75, CI=3.15) versus the scores they would theoretically
have earned had they received partial credit only at a random chance rate of 25% (N=44, Mean=23.31, SD=7.51,
CI=3.05). A significance level of 0.01 was used to determine whether there was a significant difference between
the scores students earned with their actual partial credit and the scores students would have earned under
random chance. The t-test for exam F (which, while still significant, was the least significant of the 22 tests)
gave a p-value of <0.001 (N=44, SD=7.60, CI=3.09). This value strongly suggests that students obtained partial
credit, and thereby earned a higher grade, at a statistically significant level.
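
The random-chance mean for exam F can be approximated from the scoring rule given earlier (15 questions, 3 points for a correct first answer, 1 point for a correct second answer). The sketch below is one plausible way to compute such a baseline, not necessarily the procedure the authors followed; the function name and its parameters are hypothetical.

```python
def chance_baseline(mean_first_points: float,
                    n_questions: int = 15,
                    first_pts: int = 3,
                    second_pts: int = 1,
                    chance_rate: float = 0.25) -> float:
    """Mean score expected if every second attempt were a blind guess.

    Assumed procedure: convert the mean first-attempt points into a mean
    number of first-attempt misses, then award second-attempt credit at
    the random-guess rate.
    """
    mean_correct_first = mean_first_points / first_pts
    mean_wrong_first = n_questions - mean_correct_first
    return mean_first_points + mean_wrong_first * chance_rate * second_pts


# Exam F: mean score without partial credit was 21.34 out of 45 points.
baseline_f = chance_baseline(21.34)
percent_without_partial = 21.34 / (15 * 3) * 100

print(f"{baseline_f:.2f}")
print(f"{percent_without_partial:.2f}%")
```

Under these assumptions the baseline agrees with the reported random-chance mean of 23.31 to two decimal places, and the percentage agrees with the reported 47.42%.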

The Pearson correlation coefficient was -0.1703, meaning that there is a weak negative correlation between
the grade a student earned and the amount of partial credit obtained by that student. This was expected,
because a student with a perfect score must answer every question correctly on the first attempt and therefore
earns zero partial credit. Higher grades thus require fewer incorrect initial choices, leaving fewer
opportunities for partial credit.
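
To make the direction of this relationship concrete, the following sketch computes a Pearson correlation coefficient from its definition. The six score pairs are hypothetical values invented for illustration and are not data from this study; they show only why the sign is negative and give a much stronger correlation than the -0.1703 reported here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL scores for six students on a 45-point exam (not study data).
grades = [45, 39, 33, 27, 21, 15]
# HYPOTHETICAL partial credit points earned by the same six students.
partial_credit = [0, 1, 2, 2, 4, 3]

r = pearson_r(grades, partial_credit)
print(f"r = {r:.3f}")
```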

Looking at our complete data set, the number of partial credit points that would have been earned by guessing
was 2194. In reality, students earned a total of 3938 partial credit points, nearly 80% more than would be
expected from random guessing alone.

An analysis was completed comparing the average exam scores without partial credit, with partial credit obtained
at random chance, and with the partial credit that was actually obtained for each exam (see Figure 2 for a visual
comparison). This analysis shows that the average grades obtained with no partial credit, with the actual
partial credit earned, and with partial credit at random chance are extremely close to one another for exams F
and Q: exam F had averages of 21.34, 24.30, and 23.31, respectively, while exam Q had averages of 21.17, 24.58,
and 23.16, respectively. Although these differences are small, both exams showed partial credit earned in
statistically significant excess of random chance; the t-test for exam Q gave a p-value of 2.3 x 10^-11 (N=52,
SD=6.22, CI=2.31).

4. Discussion and Conclusion 

The goals of teaching are to facilitate learning, to increase retention of knowledge and to expand understanding. 
Ideally, testing should be designed to assess learning. In today’s pedagogy, established methods of lecturing and 
testing are being challenged with new approaches aimed at increasing students’ understanding of course material.
Methods that permit immediate feedback to students during lectures and tests have been shown to promote more
effective long-term understanding (Roediger & Butler, 2011). Classroom response systems, for example, have
gained considerable acceptance in engaging students during lectures in large classes (Schell, Lukoff, & Mazur, 
2013; Heaslip, Donovan, & Cullen, 2014). In addition, continual “retrieval practice” enhanced with rapid 
feedback has been shown to assist persistent learning (Roediger & Butler, 2011). Clearly, rapid feedback is a 
critical element of all testing to ensure that students’ mistakes do not persist (Attali, 2011). 

Designing good tests is a challenge for instructors. With increasing class sizes or the desire to present multiple 
exam opportunities, multiple-choice questions are often used to speed up grading. Little, E. Bjork, R. Bjork and 
Angello (2012) note that multiple-choice tests can be useful learning tools that foster productive retrieval 
learning. Mathematical analysis of multiple choice tests shows that the fairest method for grading is to give 
credit for the number of correct answers with no penalty for a wrong or missing answer (Scharf & Baldwin, 
2007); this approach, however, does not reduce credit given for blindly guessing an answer (Bush, 2015). In 
most cases, multiple choice questions pose two critical problems: one, students must choose from a fixed set of 
answers without displaying their learning process; and, two, common to all testing methods, there is a delay in 
learning whether the answer is correct or not. 

Epstein et al. designed IF AT® forms to overcome some of these issues (Epstein et al., 2001; Epstein et al., 2002; 
Dihoff, Brosvic, & Epstein, 2003; Brosvic, Epstein, Cook, & Dihoff, 2005). For the professor, the immediate 
feedback assessment technique offers a benefit in that the grading of the multiple-choice portion of an exam is 
nearly completed during the actual testing process. Grading consists of each student writing the number of
points received for each question beside the question after completing the exam. In
addition, many researchers have found that, when compared to other testing methods (traditional format,
end-of-test feedback, delayed feedback (feedback given 24 hours after testing), and Scantron testing formats),
the IF AT® provided better recall of material when it was tested again on a later final exam (Epstein & Brosvic,
2002; Dihoff et al., 2003; Dihoff et al., 2004).

IF AT® forms can be used to give partial credit for a correct answer after an initial incorrect response, and some
faculty use IF AT® forms to allow students to answer until they reach a correct answer (Attali, 2011). One
problem with giving partial credit is that it may promote guessing. Slepkov (2013) suggested that second answers
on IF AT® forms were better than guessing. Our analysis supports Slepkov’s assertion. We found that the
percent of correct second answers was 44.9%, almost 20 percentage points higher than the 25% expected from
random guessing. With rare exceptions, we and others (Dibattista & Gosse, 2006; Epstein & Brosvic, 2002) have
found that students liked using the IF AT® forms and reported informally that the forms reflected what they had
learned better than standard multiple-choice formats. We therefore conclude that IF AT® forms are an effective
and practical assessment tool and encourage their broader adoption by educators. More specifically, we encourage
adoption by science/chemistry faculty who, in the authors’ experience, typically shy away from multiple-choice
assessments because of the inability to award partial-credit points. Finally, we caution readers that, while our
data are strongly indicative that our students’ second-chance responses were grounded in learning, future
research should probe the nature of the learning triggered through immediate feedback in a testing situation.